Building competitive direct acoustics-to-word models for English conversational speech recognition

Abstract

Direct acoustics-to-word (A2W) models in the end-to-end paradigm have received increasing attention compared to conventional sub-word based automatic speech recognition models using phones, characters, or context-dependent hidden Markov model states. This is because A2W models recognize words from speech without any decoder, pronunciation lexicon, or externally-trained language model, making training and decoding with such models simple. Prior work has shown that A2W models require orders of magnitude more training data in order to perform comparably to conventional models. Our work also showed this accuracy gap when using the English Switchboard-Fisher data set. This paper describes a recipe to train an A2W model that closes this gap and is at-par with state-of-the-art sub-word based models. We achieve a word error rate of 8.8%/13.9% on the Hub5-2000 Switchboard/CallHome test sets without any decoder or language model. We find that model initialization, training data order, and regularization have the most impact on the A2W model performance. Next, we present a joint word-character A2W model that learns to first spell the word and then recognize it. This model provides a rich output to the user instead of simple word hypotheses, making it especially useful in the case of words unseen or rarely-seen during training.
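
The abstract reports its results as word error rates (WER). As a rough illustration of how that metric is conventionally defined, the sketch below computes WER as the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. This is not the paper's scoring pipeline: the function name `word_error_rate` and the example sentences are hypothetical, and evaluations on conversational test sets typically also apply text normalization rules that are omitted here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical sentences: one substitution (sat -> sit) and one deletion (the).
    wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
    print(f"WER = {wer:.1%}")
```

Because insertions count as errors while the denominator is the reference length, WER computed this way can exceed 100% when the hypothesis is much longer than the reference.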